Comparison between the stochastic search variable selection and the least absolute shrinkage and selection operator for genome-wide association studies of rheumatoid arthritis
Abstract
BACKGROUND: Because multiple loci control complex diseases, there is great interest in testing markers simultaneously rather than one at a time. In this paper, we applied two model selection algorithms, stochastic search variable selection (SSVS) and the least absolute shrinkage and selection operator (LASSO), to two quantitative phenotypes related to rheumatoid arthritis (RA).
RESULTS: The Genetic Analysis Workshop 16 data include 2,062 unrelated individuals and 545,080 single-nucleotide polymorphism markers from the Illumina 550k chip. We performed our analyses on the cases only, because quantitative phenotype data were not provided for the controls, and compared the performance of the two algorithms. With sure independence screening as the prescreening procedure, both SSVS and LASSO give small models. No markers are identified in the human leukocyte antigen region of chromosome 6, which has been shown to be associated with RA. SSVS and LASSO identify seven common loci, some of which are on the genes LRRC8D, LRP1B, and COLEC12. These genes have not been reported to be associated with RA. LASSO also identified a common locus on gene KTCD21 for the two phenotypes (markers rs230662 and rs483731, respectively).
CONCLUSION: SSVS outperforms LASSO in simulation studies. Both SSVS and LASSO give small models on the RA data; however, this depends on the model parameters. We also demonstrate that both LASSO and SSVS can handle more markers than samples.
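The two-stage idea mentioned in the abstract, prescreening with sure independence screening and then fitting a LASSO on the surviving markers, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the genotypes and phenotype are simulated, the function names (`sis_screen`, `lasso_cd`), the screening size `d`, and the penalty `lam` are all assumptions chosen for the example, the LASSO is solved with a generic textbook coordinate-descent loop, and SSVS is not shown.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d columns of X with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom[denom == 0] = np.inf          # constant columns get a score of zero
    score = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(score)[::-1][:d])

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered data
    using cyclic coordinate descent."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()                   # residual for the current beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of column j with the residual that excludes column j
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += Xc[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

# Simulated example (assumed sizes): more markers than samples.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
causal = np.array([10, 500, 1500])                     # hypothetical causal markers
y = X[:, causal] @ np.array([1.5, -1.5, 1.0]) + rng.normal(scale=0.5, size=n)

kept = sis_screen(X, y, d=50)            # stage 1: prescreen to 50 markers
beta = lasso_cd(X[:, kept], y, lam=0.2)  # stage 2: LASSO on the survivors
selected = kept[beta != 0]
print("markers retained by LASSO:", selected)
```

Both `d` and `lam` control how small the final model is, which is in line with the abstract's remark that model size depends on the model parameters; in practice they would typically be tuned, for example by cross-validation.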
Similar resources
Combining least absolute shrinkage and selection operator (LASSO) and principal-components analysis for detection of gene-gene interactions in genome-wide association studies
Variable selection in genome-wide association studies can be a daunting and statistically challenging task because there are more variables than subjects. We propose an approach that uses principal-component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO) to identify gene-gene interactions in genome-wide association studies. A PCA was used to first reduce the dimension...
Model Selection Methods for Genome Wide Association Studies
Because multiple loci control complex phenotypes, there is great interest in testing markers simultaneously instead of one by one. In this paper, we compare three model selection methods for genome-wide association studies using simulations: the Stochastic Search Variable Selection (SSVS), the Least Absolute Shrinkage and Selection Operator (LASSO), and the Elastic Net. We also apply t...
Detecting single-nucleotide polymorphism by single-nucleotide polymorphism interactions in rheumatoid arthritis using a two-step approach with machine learning and a Bayesian threshold least absolute shrinkage and selection operator (LASSO) model
The objective of this study was to detect interactions between relevant single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). Data from Problem 1 of the Genetic Analysis Workshop 16 were used. These data consisted of 868 cases and 1,194 controls genotyped with the 500 k Illumina chip. First, machine learning methods were applied for preselecting SNPs. One hundred SNP...
The Pattern of Linkage Disequilibrium in Livestock Genome
Linkage disequilibrium (LD) is the basis of genomic selection, genomic marker imputation, marker-assisted selection (MAS), quantitative trait loci (QTL) mapping, parentage testing, and whole-genome association studies. Particular alleles at close loci tend to be co-inherited. In linked loci, this pattern leads to an association between alleles in the population, which is known as LD. Two metr...
Application of an iterative Bayesian variable selection method in a genome-wide association study of rheumatoid arthritis
Genome-wide association studies usually involve several hundred thousand single-nucleotide polymorphisms (SNPs). Conventional approaches face challenges when there is an enormous number of SNPs but a relatively small number of samples and, in some cases, are not feasible. We introduce here an iterative Bayesian variable selection method that provides a unique tool for association studies with ...